1. Introduction
Since the earliest history, various institutions (e.g., governments and private companies alike) have recorded their actions and transactions. Subsequent generations have used these archival records to understand the history of the institution, the national heritage, and the human journey. These records may be essential to support the efficiency of the institution, to protect the rights of individuals and businesses, and/or to ensure that the private company or public corporation/company is accountable to its employees/shareholders and/or that the Government is accountable to its citizens.
With the advance of technology into a dynamic and unpredictable digital era, evidence of the acts and facts of institutions and the government and our national heritage are at risk of being irrecoverably lost. The challenge is pressing—as time moves forward and technologies become obsolete, the risks of loss increase. It will be appreciated that a need has developed in the art to develop an electronic records archives system and method especially, but not only, for the National Archives and Records Administration (NARA) in a system known as Electronic Records Archives (ERA), to resolve this growing problem, in a way that is substantially obsolescence-proof and policy neutral. While embodiments of the invention will be described with respect to its application for safeguarding government records, the described embodiments are not limited to archives systems applications nor to governmental applications and can also be applied to other large scale storage applications, in addition to archives systems, and for businesses, charitable (e.g., non-profit) and other institutions, and entities.
One aspect of the invention is directed to an architecture that will support operational, functional, physical, and interface changes as they occur. In one example, a suite of commercial off-the-shelf (COTS) hardware and software products has been selected to implement and deploy an embodiment of the invention in the ERA, but the inventive architecture is not limited to these products. The architecture facilitates seamless COTS product replacement without negatively impacting the ERA system.
1.1 Understanding the Problem
Another aspect of the ERA is to preserve and to provide ready access to authentic electronic records of enduring value.
In one embodiment, the ERA supports and flows from NARA's mission to ensure “for the Citizen and the Public Servant, for the President and the Congress and the Courts, ready access to essential evidence.” This mission facilitates the exchange of vital ideas and information that sustains the United States of America. NARA is responsible to the American people as the custodian of a diverse and expanding array of evidence of America's culture and heritage, of the actions taken by public servants on behalf of American citizens, and of the rights of American citizens. The core of NARA's mission is that this essential evidence must be identified, preserved, and made available for as long as authentic records are needed—regardless of form.
The creation and use of an unprecedented and increasing volume of Federal electronic records—in a wide variety of formats, using evolving technologies—poses a problem that the ERA must solve. An aspect of the invention involves an integrated ERA solution supporting NARA's evolving business processes to identify, preserve, and make available authentic, electronic records of enduring value—for as long as they are needed.
In another embodiment, the ERA can be used to store, process, and/or disseminate a private institution's records. That is, in an embodiment, the ERA may store records pertaining to a private institution or association, and/or the ERA may be used by a first entity to store the records of a second entity. System solutions, no matter how elegant, should be integrated with the institutional culture and organizational processes of the users.
1.1.1 NARA's Evolving Business Processes
Since 1934, NARA has developed effective and innovative processes to manage the records created or received, maintained or used, and destroyed or preserved in the course of public business transacted throughout the Federal Government. NARA played a role in developing this records lifecycle concept and related business processes to ensure long-term preservation of, and access to, authentic archival records. NARA also has been instrumental in developing the archival concept of an authentic record that consists of four fundamental attributes: content, structure, context, and presentation.
NARA has been managing electronic records of archival value since 1968, longer than almost anyone in the world. Despite this long history, the diverse formats and expanding volume of current electronic records pose new challenges and opportunities for NARA as it seeks to identify records of enduring value, preserve these records as vital evidence of our nation's past, and make these records accessible to citizens and public servants in accordance with statutory requirements.
The ERA should support, and may affect, the institution's (e.g., NARA's) evolving business processes. These business processes mirror the records lifecycle and are embodied in the agency's statutory authority:
Providing guidance to Federal Agencies regarding records creation and records management;
Scheduling records for appropriate disposition;
Storing and preserving records of enduring value; and/or
Making records available in accordance with statutory and regulatory provisions.
Within this lifecycle framework, the ERA solution provides an integrated and automated capability to manage electronic records from: the identification and capture of records of enduring value; through the storage, preservation, and description of the records; to access control and retrieval functions.
Developing the ERA involves far more than just warehousing data. For example, the archival mission is to identify, preserve, and make available records of enduring value, regardless of form. This three-part archival mission is the core of the Open Archival Information System (OAIS) Reference Model, expressed as ingest, archival storage, and access. Thus, one ERA solution is built around the generic OAIS Reference Model (presented in FIG. 1), which supports these core archival functions through data management, administration, and preservation planning.
The ERA may coordinate with the front-end activities of the creation, use, and maintenance of electronic records by Federal officials. This may be accomplished through the implementation of disposition agreements for electronic records and the development of templates or schemas that define the content, context, structure, and presentation of electronic records along with lifecycle data referring to these records.
The ERA solution may complement NARA's other activities and priorities, e.g., by improving the interaction between NARA staff and their customers (in the areas of scheduling, transfer, accessioning, verification, preservation, review and redaction, and/or ultimately the ease of finding and retrieving electronic records).
1.1.2 Encompassing a Broad Scope of Records
Like NARA itself, the scope of ERA includes the management of electronic and non-electronic records, permanent and temporary records, and records transferred from Federal entities as well as those donated by individuals or organizations outside of the government. Each type of record is described and/or defined below.
ERA and Non-Electronic Records: Although the focus of ERA is on preserving and providing access to authentic electronic records of enduring value, the system's scope also includes, for example, management of specific lifecycle activities for non-electronic records. ERA will support a set of lifecycle management processes (such as those used for NARA) for appraisal, scheduling, disposition, transfer, accessioning, and description of both electronic and non-electronic records. A common systems approach to appraisal and scheduling through ERA will improve the efficiency of such tasks for non-electronic records and help ensure that permanent electronic records are identified as early as possible within the records lifecycle. This same common approach will automate aspects of the disposition, transfer, accessioning, and description processes for all types of records, which will result in significant workflow efficiencies. Archivists, researchers, and other users may realize benefits by having descriptions of both electronic and non-electronic records available together in a powerful, universal catalog of holdings. In an embodiment, some of ERA's capabilities regarding non-electronic records may come from subsuming the functionality of legacy systems such as the Archival Research Catalog (ARC). To effectively manage lifecycle data for all types of records, in certain embodiments, ERA also may maintain data interchange with (but not subsume) other legacy systems and likely future systems related to non-electronic records.
Permanent and Temporary Records: There is a fundamental archival distinction between records of enduring historic value, such as those that NARA must retain forever (e.g., permanent records) and those records that a government must retain for a finite period of time to conduct ongoing business, meet statutory and regulatory requirements, or protect rights and interests (e.g., temporary records).
For a particular record series from the US Federal Government, NARA identifies these distinctions during the record appraisal and scheduling processes and they are reflected in NARA-approved disposition agreements and instructions. Specific records are actually categorized as permanent or temporary during the disposition and accessioning processes. NARA takes physical custody of all permanent records and some temporary records, in accordance with approved disposition agreements and instructions. While all temporary records are eventually destroyed, NARA ultimately acquires legal (in addition to physical) custody over all permanent records.
ERA may address the distinction between permanent and temporary records at various stages of the records life-cycle. ERA may facilitate an organization's records appraisal and scheduling processes where archivists and transferring entities may use the system to clearly identify records as either permanent or temporary in connection with the development and approval of disposition agreements and instructions. The ERA may use this disposition information in association with the templates to recognize the distinctions between permanent and temporary records upon ingest and manage these records within the system accordingly.
For permanent records, this may involve transformation to persistent formats or use of enhanced preservation techniques to ensure their preservation and accessibility forever. This also may apply to temporary records of long-term value, such as, for example, medical records. For example, any record that must be retained beyond the life of its originating system may need one or more “transformations” that maintain the authenticity of the records. For temporary records, NARA's Records Center Program (RCP) is exploring offering its customers an ERA service to ingest and store long-term temporary records in persistent formats. To the degree that the RCP opts to facilitate its customers' access to the ERA for appropriate preservation of long-term temporary electronic records, this same coordination relationship with transferring entities through the RCP will allow NARA to effectively capture permanent electronic records earlier in the records lifecycle. In the end, ERA may also provide for the ultimate destruction of temporary electronic records.
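By way of illustration only, the permanent/temporary distinction described above might be represented as follows. This is a minimal sketch under stated assumptions: the names `Disposition`, `DispositionAgreement`, and `destruction_date`, and the idea of expressing retention in whole years, are hypothetical and are not drawn from the ERA itself.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Disposition(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


@dataclass(frozen=True)
class DispositionAgreement:
    """Hypothetical stand-in for an approved disposition agreement."""
    series: str
    disposition: Disposition
    retention_years: Optional[int] = None  # only meaningful for temporary records


def destruction_date(agreement: DispositionAgreement, ingested: date) -> Optional[date]:
    """Return the earliest destruction date, or None for permanent records."""
    if agreement.disposition is Disposition.PERMANENT:
        return None  # permanent records are never scheduled for destruction
    if agreement.retention_years is None:
        raise ValueError("temporary records need a retention period")
    target_year = ingested.year + agreement.retention_years
    try:
        return ingested.replace(year=target_year)
    except ValueError:
        # Feb 29 ingested, target year is not a leap year: fall back to Feb 28.
        return ingested.replace(year=target_year, day=28)
```

In this sketch, a permanent series yields no destruction date at all, while a long-term temporary series (for example, medical records) yields a date far enough in the future that the record may outlive its originating system.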
ERA and Donated Materials: In addition to federal records, NARA also receives and accesses donated archival materials. Such donated collections comprise a significant percentage of NARA's Presidential Library holdings, for example. ERA may manage donated electronic records in accordance with deeds of gift or deposit agreements which, when associated with templates, may ensure that these records are properly preserved and made available to users. Although donated materials may involve unusual disposition instructions or access restrictions, ERA should be flexible enough to adapt to these requirements. Since individuals or institutions donating materials to NARA are likely to be less familiar with ERA than federal transferring entities, the system may also include guidance and tools to help donors and the NARA appraisal staff working with them ensure proper ingest, preservation, and dissemination of donated materials.
1.1.3 Meeting the Needs of Users
Systems are designed to facilitate the work of users, and not the other way around. One or more of the following illustrative classes of users may interact with the ERA: transferring entity; appraiser; records processor; preserver; access reviewer; consumer; administrative user; and/or a manager. The ERA may take into account data security, business process re-engineering, and/or systems development and integration. The ERA solution also may provide easy access to the tools the users need to process and use electronic records holdings efficiently.
1.2 Mitigating Risks and Meeting Challenges
NARA must meet challenges relating to archiving massive amounts of information, or the American people risk losing essential evidence that is only available in the form of electronic federal records. But beyond mitigating substantial risks, the ERA affords such opportunities as:
Using digital communication tools, such as the Internet, to make electronic records holdings, such as NARA's, available beyond the research room walls in offices, schools, and homes throughout the country and around the world;
Allowing users to take advantage of the information-processing efficiencies and capabilities afforded by electronic records;
Increasing the return on the public's investment by demonstrating technological solutions to electronic records problems that will be applied throughout our digital society in a wide variety of institutional settings; and/or
Developing tools for archivists to perform their functions more efficiently.
According to one aspect of the invention, there is provided a system for ingesting, storing, and/or disseminating information. The system may include an ingest module, a storage module, and a dissemination module that may be accessed by a user via one or more portals.
In an aspect of certain embodiments, there is provided a system and method for automatically identifying, preserving, and disseminating archived materials. The system/method may include extreme scale archives storage architecture with redundancy or at least survivability, suitable for the evolution from terabytes to exabytes, etc.
In another aspect of certain embodiments, there is provided an electronic records archives (ERA), comprising an ingest module to accept a file and/or a record, a storage module to associate the file or record with information and/or instructions for disposition, and an access or dissemination module to allow selected access to the file or record. The ingest module may include structure and/or a program to create a template to capture content, context, structure, and/or presentation of the record or file. The storage module may include structure or a program to preserve authenticity of the file or record over time, and/or to preserve the physical access to the record or file over time. The access module may include structure and/or a program to provide a user with the ability to view/render the record or file over time, to control access to restricted records, to redact restricted or classified records, and/or to provide access to an increasing number of users anywhere at any time.
The ingest module may include structure or a program to auto-generate a description of the file or record. Each record may be transformed, e.g., using a framework that wraps and computerizes the record in a self-describing format with appropriate metadata to represent information in the template.
The ingest module may include structure or a program to process a Submission Information Package (SIP) and/or an Archival Information Package (AIP). The access module may include structure or a program to process a Dissemination Information Package (DIP).
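The flow from a Submission Information Package through an Archival Information Package to a Dissemination Information Package can be sketched as below. The sketch is illustrative only: the field names, the `restricted` flag, and the `ingest` and `disseminate` functions are assumptions made for this example, not the interfaces of the ERA modules.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SIP:
    """Submission Information Package: what a transferring entity submits."""
    record_id: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class AIP:
    """Archival Information Package: what the archives stores."""
    record_id: str
    content: bytes
    metadata: Dict[str, str]
    restricted: bool


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package: what a user receives."""
    record_id: str
    content: bytes
    metadata: Dict[str, str]


def ingest(sip: SIP, restricted: bool = False) -> AIP:
    """Turn a SIP into an AIP, adding a simple auto-generated description."""
    metadata = dict(sip.metadata)  # copy so the SIP itself is left untouched
    metadata["size_bytes"] = str(len(sip.content))
    return AIP(sip.record_id, sip.content, metadata, restricted)


def disseminate(aip: AIP, user_cleared: bool) -> DIP:
    """Produce a DIP, refusing restricted records to users without clearance."""
    if aip.restricted and not user_cleared:
        raise PermissionError(f"record {aip.record_id} is restricted")
    return DIP(aip.record_id, aip.content, dict(aip.metadata))
```

Keeping the three package types distinct mirrors the separation of the ingest, storage, and access modules: each module can evolve independently so long as the package it hands off keeps its shape.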
Independent aspects of the invention may include the ingest module alone or one or more aspects thereof, the storage module alone or one or more aspects thereof, and/or the access module alone or one or more aspects thereof.
Still further aspects of the invention relate to methods for carrying out one or more functions of the ERA or components thereof (ingest module, storage module, and/or access module).
1.3 Archival Problems in General and Drawbacks of Existing Solutions
It is not enough just to preserve electronic records. Now and into the future, archivists must be able to attest to the authenticity of the preserved records to protect the rights and interests of various constituents. If records cannot be certified as authentic, there is a risk of unraveling the trust system upon which society is based.
In the words of Jeff Rothenberg of the Rand Corporation:
The relationship between digital preservation and authenticity stems from the fact that meaningful preservation implies the usability of that which is preserved. That is, the goal of preservation is to allow future users to retrieve, access, decipher, view, interpret, understand, and experience documents, data, and records in meaningful and valid (that is authentic) ways. An informational entity that is “preserved” without being usable in a meaningful and valid way has not been meaningfully preserved, i.e., has not been preserved at all.
The difficulty of defining a viable digital preservation strategy is partly the result of our failing to understand and appreciate the authenticity issues surrounding digital informational entities and the implications of these issues for potential technical solutions to the digital preservation problem.
(See Jeff Rothenberg, “Preserving Authentic Digital Information,” in Authenticity in a Digital Environment, May 2000. Council on Library and Information Resources, pages 51-68. Available at: www.clir.org/pubs/abstract/pub92abst.html.)
In order to establish a common understanding, it is important to clarify four key concepts and the relationships among them—namely, reliability, authenticity, authentication, and trustworthiness.
1.3.1 Reliability
The InterPARES Project, an international collaboration researching the preservation of electronic records, defined reliability and authenticity. These definitions, in turn, have been adopted by most subsequent research projects and initiatives. A reliable record stands for the facts it contains; that is, the record's content can be trusted. The reliability of a record depends upon, for example, the completeness of the record's form, the control exercised over the process of creation, etc.
A reliable record has authority—that is, there is knowledge of who created the record, when it was created, how it was created, and the purpose for which it was created. Reliability generally is more the concern of the record's creator than its preserver. In some ways, reliability is a “given” (e.g., must be assumed) before records ever reach the electronic archives. Although unreliable records generally cannot be made reliable, the issue of reliability cannot be ignored.
In this vein, there are two options for establishing a policy related to the reliability of submitted records. First, all records submitted by institutions may be accepted. In this case it will be assumed that the records are reliable because the providers say so. Second, reliability criteria that providers must meet before records will be accepted may be established. The criteria may deal with completeness of the record, procedural controls over the creation of the records, etc. For example, the Authenticity Task Force of the InterPARES Project has established a set of criteria that may be used as a basis for setting such criteria.
1.3.2 Authenticity
The InterPARES Project defines an authentic record as “a record that is what it purports to be and is free from tampering or corruption.” Broadly considered, the authenticity of records depends upon actions by both the Records Creator and the Records Preserver. In particular, the Records Creator generally is concerned with the “truth” of the original record, including, for example, the mode, form, and/or state of transmission of the records as drafts, originals, and/or copies. The Records Preserver generally is concerned with the manner of the maintenance, preservation, and custody of the records. The mode of transmission of the record generally is the means used to transmit a record across space and time, whereas the form of transmission generally is the physical carrier on which a record is received (e.g., paper, film, disk, magnetic tape, etc.).
For a record to be authentic (meaning that the record remains reliable over time), its preservation should occur under strict controls. Some questions that may be used when determining whether a record is authentic follow:
When was a record copied or migrated?
Who did the copying or migration?
How did the copying or migration take place?
What quality control processes governed the copying or migration?
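The four questions above suggest the minimum information that might be recorded each time a record is copied or migrated. The following is a hedged sketch only: the `MigrationEvent` structure and the `unanswered` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MigrationEvent:
    """One copy or migration of a record; each field answers one question."""
    when: Optional[datetime] = None          # When was the record copied or migrated?
    who: Optional[str] = None                # Who did the copying or migration?
    how: Optional[str] = None                # How did it take place?
    quality_controls: Tuple[str, ...] = ()   # Which quality control processes applied?


def unanswered(event: MigrationEvent) -> List[str]:
    """List the questions this event leaves open, in the order asked above."""
    missing = []
    if event.when is None:
        missing.append("when")
    if not event.who:
        missing.append("who")
    if not event.how:
        missing.append("how")
    if not event.quality_controls:
        missing.append("quality_controls")
    return missing
```

An event with any unanswered question would, under this sketch, be a signal that the controls surrounding that migration cannot be fully demonstrated later.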
“Trust” and “truthfulness” have become key aspects of an authentic record. Because conformity with “the truth” is a judgment, a determination of authenticity likewise will be a judgment. For example, though it is necessary to have an accurate bit stream, such a bit stream is not sufficient to have an “authentic record.” It is this broad sense of authenticity that must be addressed. Indeed, authenticity includes issues such as, for example, integrity, completeness, correctness, validity, faithfulness to an original, meaningfulness, and suitability for an intended purpose.
1.3.3 Authentication
Although “authenticity” and “authentication” often are used together, they sometimes may be thought of as quite different concepts. By way of example and without limitation, authentication sometimes may be thought of as being a narrower term than authenticity. For example, authentication generally is a declaration about a record at a given time. The rules governing authentication may be established by legislation or other policy. Authentication generally means that the custodian of a record issues a statement saying that a record is authentic at this time. Authentication thus may be thought of as being external to the record itself and is temporary (as opposed to authenticity, which is a quality of the record that is to be constantly protected over the long-term). An “authenticated record” only can be as reliable as when the record was first issued by its creator. In certain embodiments, it may be useful to authenticate (e.g., certify) a record from time-to-time to indicate that authenticity is being maintained.
1.3.4 Trustworthiness
The Minnesota Historical Society has defined the concept of a “trustworthy information system.” As stated in the TIS Handbook, “Trustworthiness refers to an information system's accountability and its ability to produce reliable and authentic information and records.” In an embodiment, documentation and metadata are a part of a trustworthy information system, as they are useful in proper data creation, storage, retrieval, modification, retention, destruction, and the like.
Ensuring the authenticity over time of digital records is a major concern that has at least two aspects. A first aspect relates to checking and certifying data integrity (e.g., associated with technical processes such as integrity checking, certification, digital watermarking, steganography, and/or user authentication protocols). A second aspect relates to identifying the intellectual qualities of information that make it authentic (e.g., associated with legal, cultural, and/or philosophical concepts such as trustworthiness and completeness).
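The first, technical aspect can be illustrated with a simple integrity (fixity) check. This is a minimal sketch that assumes SHA-256 as the digest; the source does not name a particular algorithm, and the function names `fixity` and `verify_fixity` are hypothetical. Consistent with the discussion of authenticity above, a matching digest shows only that the bit stream is unchanged, which is necessary but not sufficient for an authentic record.

```python
import hashlib
import hmac


def fixity(content: bytes) -> str:
    """Compute a hex digest to record alongside the stored bit stream."""
    return hashlib.sha256(content).hexdigest()


def verify_fixity(content: bytes, recorded_digest: str) -> bool:
    """Check that the bit stream still matches the digest recorded earlier."""
    # compare_digest avoids leaking, through timing, where the digests differ.
    return hmac.compare_digest(fixity(content), recorded_digest)
```

A digest would typically be computed at ingest and re-verified periodically and after every copy or migration; the second, intellectual aspect of authenticity has to be addressed by other means, such as preserved context and custody information.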
According to Anne Gilliland-Swetland, “Preserving knowledge is more complex than preserving only media or content. It is about preserving the intellectual integrity of information objects, including capturing information about the various contexts within which information is created, organized, and used; organic relationships with other information objects; and characteristics that provide meaning and evidential value.” Accordingly, one feature of certain example embodiments relates to preserving knowledge and making it available. This complex task involves both technical and intellectual challenges.
Unfortunately, commercial systems for electronic archiving are built around storage and/or workflow technologies but do not provide the highest levels of authenticity support over indefinite periods of time. Moreover, commercial systems also tend to target archival needs within an enterprise and sometimes for compliance with targeted government regulations, such as Sarbanes-Oxley, whereas a complete archives system (such as NARA's) must accept records and other associated electronic assets (e.g., administrative information about the records) from other enterprises and has more stringent archival requirements. For example, as the custodian of the nation's archived electronic assets, NARA has to support basic rights of citizens and obligations of the government, such as military pensions and patents, which lead, for example, to indefinite retention requirements. Also, current electronic records archives systems and processes are manually intensive and do not provide comprehensive support for electronic records authenticity.
For example, it is noted that there are current commercial off-the-shelf (COTS) products that provide some elements of authenticity, but not all elements. EMC's Documentum and Centera products are examples. Certain example systems have implemented Documentum for forms (e.g., entry), workflow infrastructure, and content management of some data (e.g., business objects). Centera is a storage system that provides protection and some metadata and search capabilities, but it does not provide processes for authenticity. In general, COTS products would address specific regulatory requirements, such as, for example, Sarbanes-Oxley, if anything, which target commercial business, rather than the more stringent needs (e.g., of NARA) that drive the innovative solution of the example embodiments.
Thus, it will be appreciated that there is a need in the art for improved systems and/or methods that is/are scalable essentially without limitation for establishing and maintaining comprehensive authenticity of electronic records over an indefinite period of time in a substantially obsolescence-proof manner.
According to certain example embodiments, a system for establishing and maintaining authenticity of a plurality of records and/or documentary materials to be persisted in an electronic archives system is provided. Safeguarding programmed logic circuitry may be configured to safeguard each said record and/or documentary material throughout its entire lifecycle by monitoring and recording both intended changes to each said record and/or documentary material and its corresponding status, as well as unintended changes to each said record and/or documentary material. Extracting and preserving programmed logic circuitry may be configured to extract and preserve context and structure associated with each said record and/or documentary material. Custody programmed logic circuitry may be configured to establish and preserve substantially uninterrupted proof-of-custody including at least a source for each said record and/or documentary material throughout its entire lifecycle. Essential characteristic programmed logic circuitry may be configured to capture and preserve essential characteristics of each said record and/or documentary material throughout its lifecycle in dependence on one or more changeable definitions of essential characteristic. At least one storage location may be configured to store the plurality of records and/or documentary materials and all preserved information. The archives system may be scalable essentially without limitation. The authenticity of the plurality of records and/or documentary materials may be comprehensively storable and maintainable over an indefinite period of time in a substantially obsolescence-proof manner despite changeability of the records and/or documentary materials, record and/or documentary material custody, and/or essential characteristic definitions.
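One way to picture substantially uninterrupted proof-of-custody is an append-only log in which each entry is bound to the one before it. The sketch below uses hash chaining for that purpose. Hash chaining is a technique chosen for this illustration only and is not stated in the description above; the names `append_custody`, `custody_unbroken`, and `GENESIS` are likewise hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, custodian: str, action: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "custodian": custodian, "action": action},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_custody(log: List[Dict[str, str]], custodian: str, action: str) -> None:
    """Append a custody entry linked to the current end of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "custodian": custodian,
        "action": action,
        "hash": _entry_hash(prev_hash, custodian, action),
    })


def custody_unbroken(log: List[Dict[str, str]]) -> bool:
    """Return True if every entry links to its predecessor and is unaltered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        expected = _entry_hash(entry["prev"], entry["custodian"], entry["action"])
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

The first entry plays the role of the source of the record; altering or removing any earlier entry causes verification of the chain to fail, which is the property a custody record needs.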
According to certain other example embodiments, a computer-implemented method tangibly embodied by at least instructions stored on a computer-readable storage medium for establishing and maintaining authenticity of a plurality of records and/or documentary materials to be persisted in an electronic archives system is provided. Each record and/or documentary material may be safeguarded throughout its entire lifecycle by monitoring and recording both intended changes to each said record and/or documentary material and its corresponding status, as well as unintended changes to each said record and/or documentary material. Context and structure associated with each said record and/or documentary material may be extracted and preserved. Substantially uninterrupted proof-of-custody including at least a source may be established and preserved for each said record and/or documentary material throughout its entire lifecycle. Essential characteristics of each said record and/or documentary material may be captured and preserved throughout its lifecycle in dependence on one or more changeable definitions of essential characteristic. The plurality of records and/or documentary materials and all preserved information may be stored. The archives system may be scalable essentially without limitation. The authenticity of the plurality of records and/or documentary materials may be comprehensively storable and maintainable over an indefinite period of time in a substantially obsolescence-proof manner despite changeability of the records and/or documentary materials, record and/or documentary material custody, and/or essential characteristic definitions.
According to still other example embodiments, a computer-implemented method tangibly embodied by at least instructions stored on a computer-readable storage medium for establishing and maintaining authenticity of a plurality of records and/or documentary materials to be persisted in an electronic archives system is provided. Transfer media from a transferring entity may be inspected to ensure that said transfer media contains at least one record and/or documentary material to be ingested. The at least one record and/or documentary material to be ingested may be stored in a temporary storage location. That the transfer media is mounted for upload into the system may be ensured. At least one security and/or integrity check may be performed on the transfer media. At least one validation check may be performed on the at least one record's and/or documentary material's bit-stream. The at least one record and/or documentary material may be stored to at least one managed storage location. Any outstanding verification issues with the transferring entity may be resolved. Necessary metadata for the at least one record's and/or documentary material's lifecycle may be persisted. The archives system may be scalable essentially without limitation. The authenticity of the plurality of records and/or documentary materials may be comprehensively storable and maintainable over an indefinite period of time in a substantially obsolescence-proof manner despite changeability of the records and/or documentary materials, record and/or documentary material custody, and/or essential characteristic definitions.
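Several of the ingest steps just described can be sketched in a few lines. The sketch is deliberately partial and rests on assumptions: transfer media is modeled as a dictionary of file names to bytes, the transferring entity is assumed to supply a manifest of SHA-256 digests, and the three stores are plain dictionaries. Media mounting and security scanning are omitted. The name `ingest_transfer` and all parameter names are hypothetical.

```python
import hashlib
from typing import Dict, List


def ingest_transfer(
    media: Dict[str, bytes],
    manifest: Dict[str, str],
    temp_store: Dict[str, bytes],
    managed_store: Dict[str, bytes],
    metadata_store: Dict[str, Dict[str, str]],
) -> List[str]:
    """Ingest records from transfer media; return names with open issues."""
    # Inspect the media: it must contain at least one record to ingest.
    if not media:
        raise ValueError("transfer media contains no records")

    issues: List[str] = []
    for name, content in media.items():
        # Hold the record in temporary storage until it has been validated.
        temp_store[name] = content

        # Validation check on the bit stream against the supplied manifest.
        digest = hashlib.sha256(content).hexdigest()
        if manifest.get(name) != digest:
            # Left in temporary storage, to be resolved with the transferring entity.
            issues.append(name)
            continue

        # Move the validated record to managed storage and persist its metadata.
        managed_store[name] = temp_store.pop(name)
        metadata_store[name] = {"sha256": digest, "size_bytes": str(len(content))}
    return issues
```

Records that fail validation never reach managed storage in this sketch; they remain in temporary storage and are reported back so that the outstanding verification issues can be resolved with the transferring entity.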
It will be appreciated that these techniques may be applied to records, assets, and/or documentary materials. It also will be appreciated that documentary materials may encompass a variety of different items. For example, in certain embodiments, documentary materials may be considered a collective term for records, non-record materials, and/or personal papers, that refers to all media on which information is recorded, regardless of the nature of the medium or the method or circumstances of recording. In certain other embodiments, documentary materials may include, for example, records (e.g., temporary and/or permanent), non-record material, personal papers or artifacts that refer to all media containing recorded information, regardless of the nature of the media or the method(s) or circumstance(s) of recording. In still other embodiments, documentary materials may be comprised of electronic information on physical media or paper records that are shipped to the archives in containers (e.g., box, envelope, etc.), and those documentary materials that include electronic information may be transmitted via HTTPS or SFTP and divided into virtual electronic containers by the system. This need not be a user activity, but instead may be performed by the packaging tool as an aid to optimize transmission via electronic means.
It will be appreciated that as used herein, the term “subroutine” is broad enough to encompass any suitable combination of hardware, software, and any other form of programmed logic circuitry (which itself may be any suitable combination of hardware, software, firmware, or the like) capable of accomplishing a specified function. It also will be appreciated that the above-described embodiments, and the elements thereof, may be used alone or in various combinations to realize yet further embodiments.
Other aspects, features, and advantages of this invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, which are a part of this disclosure and which illustrate, by way of example, principles of this invention.